Low Energy Instruction Memory Organization for Embedded Processors
Abstract
Embedded systems are electronic systems that have permeated many aspects of our lives. They are present in automobiles, household appliances, consumer electronics and many other products. In particular, demand for embedded systems running multimedia applications has been growing at an impressive rate. One distinguishing characteristic of such systems is that they are battery operated, so the systems must be energy efficient for the batteries to last long. One of the principal components of such systems is a programmable processor, which in many cases is a Very Long Instruction Word (VLIW) processor. However, energy analyses of such processors indicate that a significant share of their energy is consumed in the instruction memories. In this thesis, we address the problem of improving energy efficiency in the instruction memory hierarchy of embedded processors targeted at multimedia applications. The solution described in this dissertation rests on two key observations. First, from the technology point of view, small and distributed memories can be energy efficient. Second, from the application point of view, a significant part of the execution time of multimedia applications is spent in small loops. Motivated by these two observations, we propose a clustered L0 buffer organization template at the microarchitectural abstraction level. Essentially, the clustered organization is a distributed loop buffer organization used exclusively for storing and executing the instructions of loop kernels. The clustered L0 buffer supports the execution of regular loops, nested loops and loops with conditional constructs. Furthermore, we expose the L0 clusters to the architectural and compiler (software) abstraction levels. At the architectural level, we present an L0 cluster generation tool, which generates an optimal L0 cluster configuration for a given schedule of an application.
At the compiler level, we present an algorithm to perform operation scheduling and L0 cluster assignment for a given L0 cluster configuration. Through simulation results we show that instruction memory energy can be reduced by a factor of 6 compared to an organization with no loop buffers and by a factor of 3 compared to a centralized loop buffer organization.
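The abstract does not give the cost model used by the L0 cluster generation tool. To make the idea of choosing a cluster configuration for a given schedule concrete, the following Python sketch uses a deliberately simple, hypothetical model. Each cluster is a small loop buffer holding the instruction slots of a group of functional units (FUs). A cluster is read in a cycle only if one of its FUs issues an operation. Each read costs a fixed overhead plus a per-slot cost. All names and energy values (`loop_energy`, `e_overhead`, `e_slot`, and so on) are illustrative assumptions, not the thesis's tool.

```python
"""Toy search for an L0 cluster configuration (hypothetical cost model)."""
from typing import Iterator, List, Sequence, Set, Tuple

Cluster = List[int]            # FU indices sharing one L0 buffer
Schedule = Sequence[Set[int]]  # per cycle: FUs that issue a non-NOP operation


def set_partitions(items: List[int]) -> Iterator[List[Cluster]]:
    """Yield every way of grouping `items` into non-empty clusters."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Add `first` to each existing cluster in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or give it a cluster of its own.
        yield [[first]] + partition


def loop_energy(schedule: Schedule, clusters: List[Cluster],
                e_overhead: float, e_slot: float) -> float:
    """Energy of one pass over a loop body stored in L0 clusters.

    Assumption: a cluster is read in a cycle only if at least one of its FUs
    is active; each read costs e_overhead plus e_slot per FU slot it holds.
    """
    energy = 0.0
    for active in schedule:
        for cluster in clusters:
            if any(fu in active for fu in cluster):
                energy += e_overhead + e_slot * len(cluster)
    return energy


def best_configuration(schedule: Schedule, num_fus: int,
                       e_overhead: float = 1.0,
                       e_slot: float = 0.5) -> Tuple[float, List[Cluster]]:
    """Exhaustively pick the lowest-energy clustering (Bell-number many options)."""
    candidates = (
        (loop_energy(schedule, p, e_overhead, e_slot), p)
        for p in set_partitions(list(range(num_fus)))
    )
    return min(candidates, key=lambda pair: pair[0])


if __name__ == "__main__":
    # FUs 0 and 1 are busy every cycle; FUs 2 and 3 only in the second cycle.
    schedule = [{0, 1}, {0, 1, 2, 3}, {0, 1}, {0, 1}]
    energy, clusters = best_configuration(schedule, num_fus=4)
    print(energy, clusters)
```

With the default costs, this example selects the clusters {0, 1} and {2, 3} at 10.0 energy units. One shared buffer would be read on every cycle, and one buffer per FU would pay the overhead too often. The exhaustive search is only practical for a handful of FUs, because the number of partitions grows as the Bell numbers.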
Similar Resources
Instruction Buffering Exploration for Low Energy Embedded Processors
For multimedia applications, loop buffering is an efficient mechanism to reduce the power consumed in the instruction memory of embedded processors. Software-controlled loop buffers are especially energy efficient. However, current compilers do not fully take advantage of the possibilities of such loop buffers. This paper presents an algorithm to explore, for an application or a set of applications, what ...
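The following Python sketch shows why a loop buffer can cut instruction-memory energy and why its size involves a trade-off. It assumes a simple software-controlled policy: a loop is mapped to the buffer only if its whole body fits. The first iteration is fetched from L1 and copied into the buffer, and the remaining iterations are read from the buffer. The per-access energies, the capacity-dependent read cost, and the function names are hypothetical; they are not taken from the paper.

```python
"""Toy fetch-energy model for a software-controlled loop buffer (hypothetical costs)."""
from typing import Iterable, Sequence, Tuple


def loop_fetch_energy(body_size: int, iterations: int, capacity: int,
                      e_l1: float, e_buf_read: float, e_buf_write: float) -> float:
    """Instruction-fetch energy of one loop, in arbitrary units.

    Assumed policy: the loop uses the buffer only if the whole body fits.
    Iteration 1 comes from L1 and fills the buffer; later ones read the buffer.
    """
    if iterations <= 0:
        return 0.0
    if body_size > capacity:
        return body_size * iterations * e_l1          # every fetch from L1
    return (body_size * e_l1                          # first iteration from L1
            + body_size * e_buf_write                 # filling the buffer
            + body_size * (iterations - 1) * e_buf_read)


def best_buffer_capacity(loops: Iterable[Tuple[int, int]],
                         capacities: Sequence[int],
                         e_l1: float = 1.0) -> Tuple[int, float]:
    """Sweep candidate capacities; return (capacity, total energy) of the cheapest."""
    loops = list(loops)

    def total(capacity: int) -> float:
        e_read = capacity / 256   # hypothetical: read energy grows with capacity
        e_write = 2 * e_read      # hypothetical: a write costs two reads
        return sum(loop_fetch_energy(b, n, capacity, e_l1, e_read, e_write)
                   for b, n in loops)

    best = min(capacities, key=total)
    return best, total(best)


if __name__ == "__main__":
    # (body size in instructions, iteration count) for two hypothetical kernels
    kernels = [(32, 100), (8, 50)]
    print(best_buffer_capacity(kernels, [16, 32, 64]))  # (32, 495.0)
```

Under these assumptions, a 16-entry buffer is too small for the larger kernel, which then falls back to L1. A 64-entry buffer holds both kernels but makes every read more expensive. The 32-entry buffer sits between the two.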
Ultra-Low-Energy DSP Processor Design for Many-Core Parallel Applications
Background and Objectives: Digital signal processors are widely used in energy-constrained applications in which battery lifetime is a critical concern. Accordingly, designing ultra-low-energy processors is a major concern. In this work, as a first step, we propose a sub-threshold DSP processor. Methods: As our baseline architecture, we use a modified version of an existing ultra-low-power...
A Low Energy Clustered Instruction Memory Hierarchy for Long Instruction Word Processors
In the current embedded processors for media applications, up to 30% of the total processor power is consumed in the instruction memory hierarchy. In this context, we present an inherently low energy clustered instruction memory hierarchy template. Small instruction memories are distributed over groups of functional units and the interconnects are localized in order to minimize energy consumpti...
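As a back-of-the-envelope check, the following Amdahl-style argument assumes that the rest of the processor's power is unaffected. Let the instruction memory hierarchy account for a fraction f of total processor power, and let an optimization reduce that component's energy by a factor k. The relative power after the optimization, and the resulting saving, are:

```latex
\frac{P_{\text{after}}}{P_{\text{before}}} = (1 - f) + \frac{f}{k},
\qquad
\Delta = 1 - \frac{P_{\text{after}}}{P_{\text{before}}} = f\left(1 - \frac{1}{k}\right)
```

Take the quoted upper bound f = 0.30 and a purely hypothetical k = 2. The saving is 0.30 × 0.5 = 0.15, or 15% of total processor power. However large k becomes, the saving cannot exceed f.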
Mixed-Width Instruction Sets
Applications written for the embedded domain must perform under the constraints of limited memory and limited energy. While these constraints have always existed, current trends, such as mobile computing and ubiquitous computing, bring more and more complex applications to the embedded domain, making performance, or speed of execution, an important factor as well. For instance, we are now able...
Energy-Efficient Architecture for DP Local Sequence Alignment: Exploiting ILP and DLP
Typical approaches to solving Dynamic Programming algorithms exploit data-level parallelism by relying on specialized vector instructions. However, the fully parallelizable scheme is often not compliant with the memory organization of general-purpose processors, leading to suboptimal exploitation of parallelism and worse performance. The proposed processor architecture overcomes this issue by ...
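The following scalar Python sketch illustrates where the data-level parallelism in DP local sequence alignment comes from. It implements Smith-Waterman scoring with a linear gap penalty and fills the table one anti-diagonal at a time. Every cell on an anti-diagonal depends only on the two previous anti-diagonals, so the inner loop has no loop-carried dependence. This is one common form of the parallelism that vector instructions exploit. The scoring parameters are illustrative assumptions, and the sketch does not model the processor architecture proposed in the paper.

```python
"""Smith-Waterman local alignment score, filled in anti-diagonal order."""


def smith_waterman_antidiagonal(a: str, b: str, match: int = 2,
                                mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score of `a` and `b` (linear gap penalty)."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # row/column 0 stay zero
    best = 0
    # Anti-diagonal d holds the cells with i + j == d.
    for d in range(2, n + m + 1):
        # Cells in this loop are mutually independent: they read only
        # diagonals d-1 and d-2, so a vector unit could compute them together.
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # (mis)match, diagonal d-2
                          H[i - 1][j] + gap,     # gap in b, diagonal d-1
                          H[i][j - 1] + gap)     # gap in a, diagonal d-1
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    print(smith_waterman_antidiagonal("ACGT", "AGT"))  # 5: ACGT vs A-GT
```

The anti-diagonal lengths vary and their cells are scattered in a row-major table. This is one reason the fully parallel scheme can sit awkwardly with a conventional memory organization.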